
    Statistical distribution of common audio features : encounters in a heavy-tailed universe

    In recent years, several Music Information Retrieval (MIR) researchers have identified important drawbacks in applying algorithms that are successful on monophonic audio to polyphonic music classification and similarity assessment. Notably, these so-called "Bag-of-Frames" (BoF) algorithms share a common set of assumptions: that the numerical descriptions extracted from short-time audio excerpts (or frames) capture enough relevant information for the task at hand, that these frame-based audio descriptors are time-independent, and that descriptor frames are well described by Gaussian statistics. Thus, to improve current BoF algorithms one could i) improve current audio descriptors, ii) include temporal information in algorithms working with polyphonic music, or iii) study and characterize the real statistical properties of these frame-based audio descriptors. A literature review shows that many works focus on the first two improvements but, surprisingly, research on the third is lacking. Therefore, in this thesis we analyze and characterize the statistical distribution of common audio descriptors of timbre, tonal, and loudness information. Contrary to what is usually assumed, our work shows that the studied descriptors are heavy-tailed and thus do not belong to a Gaussian universe. This new knowledge led us to propose new algorithms that improve over the BoF approach in current MIR tasks such as genre classification, instrument detection, and automatic tagging of music. Furthermore, we also address new MIR tasks such as measuring the temporal evolution of Western popular music. Finally, we highlight some promising paths for future audio-content MIR research that will inhabit a heavy-tailed universe.
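A quick way to probe the Gaussian assumption on a set of frame-level descriptor values is the sample excess kurtosis, which is near zero for Gaussian data and large and positive for heavy-tailed data. A minimal sketch with synthetic stand-ins (the lognormal proxy for a heavy-tailed descriptor is an illustrative assumption, not the thesis' actual data):

```python
import math
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: ~0 for Gaussian data, large and positive for heavy tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0

random.seed(42)
# Synthetic stand-ins for per-frame descriptor values (hypothetical, not real audio):
gaussian_like = [random.gauss(0.0, 1.0) for _ in range(50000)]
heavy_tailed = [math.exp(random.gauss(0.0, 1.0)) for _ in range(50000)]  # lognormal

print(round(excess_kurtosis(gaussian_like), 2))  # close to 0
print(round(excess_kurtosis(heavy_tailed), 2))   # far above 0
```

If real descriptor frames behaved Gaussianly, the first figure is what one would expect; heavy-tailed frames produce the second pattern.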

    Music perception with current signal processing strategies for cochlear implants

    This work presents a brief review of hearing with cochlear implants, with an emphasis on music perception. Although speech perception in noise with cochlear implants is still the major challenge, music perception is becoming more and more important. Music can modulate emotions and stimulate the brain in different ways than speech; for this reason, music can affect the quality of life of cochlear implant users. In this paper we present traditional and new approaches to improving pitch perception with cochlear implants, as well as some signal processing methods designed to improve music perception. Finally, we present a review of music evaluation methods.

    Sound object classification for symbolic audio mosaicing: a proof-of-concept

    Paper presented at the 6th Sound and Music Computing Conference, held 23-25 July 2009 in Porto, Portugal. Sample-based music composition often involves manually searching for appropriate samples in existing audio. Audio mosaicing can be regarded as a way to automate this process by specifying the desired audio attributes, so that sound snippets matching these attributes are concatenated in a synthesis engine. These attributes are typically derived from a target audio sequence, which might limit the user's musical control. In our approach, we replace the target audio sequence with a symbolic sequence constructed from pre-defined sound object categories. These sound objects are extracted by means of automatic classification techniques. Three steps are involved in the sound object extraction process: supervised training, automatic classification, and user-assisted selection. Two sound object categories are considered: percussive and noisy. We present an analysis/synthesis framework in which the user first explores a song collection using symbolic concepts to create a set of sound objects. The selected sound objects are then used in a performance environment based on a loop-sequencer paradigm. This work is partially funded by Yamaha Corp., Japan, and the EU IST project Salero FP6-027122.
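The percussive/noisy distinction above can be sketched with a toy nearest-centroid classifier over two illustrative frame features (zero-crossing rate and crest factor); the features, synthetic signals, and classifier here are assumptions for demonstration, not the system described in the paper:

```python
import math
import random

def zcr(x):
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0) / (len(x) - 1)

def crest(x):
    """Peak-to-RMS ratio; impulsive (percussive) frames score high."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms

def features(x):
    return (zcr(x), crest(x))

def centroid(vecs):
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def classify(x, centroids):
    f = features(x)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

def percussive():  # decaying sinusoid: impulsive envelope, high crest factor
    return [math.exp(-n / 20) * math.sin(0.3 * n) for n in range(400)]

def noisy():       # uniform noise: sustained, low crest, high zero-crossing rate
    return [random.uniform(-1, 1) for _ in range(400)]

random.seed(1)
# "Supervised training": one centroid per sound object category.
centroids = {
    "percussive": centroid([features(percussive()) for _ in range(5)]),
    "noisy": centroid([features(noisy()) for _ in range(5)]),
}
print(classify(percussive(), centroids))  # percussive
print(classify(noisy(), centroids))       # noisy
```

The user-assisted selection step of the paper would then filter the automatically labelled objects by hand.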

    Measuring the evolution of contemporary western popular music

    Popular music is a key cultural expression that has captured listeners' attention for ages. Many of the structural regularities underlying musical discourse are yet to be discovered and, accordingly, their historical evolution remains formally unknown. Here we unveil a number of patterns and metrics characterizing the generic usage of primary musical facets such as pitch, timbre, and loudness in contemporary western popular music. Many of these patterns and metrics have been consistently stable for a period of more than fifty years. However, we find important changes or trends related to the restriction of pitch transitions, the homogenization of the timbral palette, and growing loudness levels. This suggests that our perception of the new may be rooted in these changing characteristics. Hence, an old tune could sound perfectly novel and fashionable, provided that it consisted of common harmonic progressions, changed the instrumentation, and increased the average loudness. This work was partially supported by Catalan Government grants 2009-SGR-164 (A.C.), 2009-SGR-1434 (J.S. and J.Ll.A.), 2009-SGR-838 (M.B.) and ICREA Academia Prize 2010 (M.B.), European Commission grant FP7-ICT-2011.1.5-287711 (M.H.), Spanish Government grants FIS2009-09508 (A.C.), FIS2010-21781-C02-02 (M.B.) and TIN2009-13692-C03-01 (J.Ll.A.), and Spanish National Research Council grant JAEDOC069/2010 (J.S.). The authors would like to thank the Million Song Dataset team for making this massive source of data publicly available.
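A trend such as the growing loudness level can be quantified with an ordinary least-squares slope over (year, loudness) pairs. A minimal sketch on synthetic data (the dataset and the 0.1 dB/year drift are invented for illustration, not the paper's measurements):

```python
import random

def slope(pairs):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = sum((x - mx) ** 2 for x, _ in pairs)
    return num / den

random.seed(7)
# Synthetic (year, loudness in dB) pairs mimicking a gradual loudness increase:
data = [(year, -20.0 + 0.1 * (year - 1955) + random.gauss(0, 1))
        for year in range(1955, 2010) for _ in range(20)]
print(round(slope(data), 3))  # roughly 0.1 dB per year
```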

    Zipf’s law in short-time timbral codings of speech, music, and environmental sound signals

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that the speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. Funding was received from Classical Planet: TSI-070100-2009-407 (MITYC), www.mityc.es; DRIMS: TIN2009-14247-C02-01 (MICINN), www.micinn.es; FIS2009-09508, www.micinn.es; and 2009SGR-164, www.gencat.cat. JS acknowledges funding from Consejo Superior de Investigaciones Científicas (JAEDOC069/2010), www.csic.es; and Generalitat de Catalunya (2009-SGR-1434), www.gencat.cat.
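The rank-frequency analysis can be sketched as follows: count code-word occurrences, sort by rank, and fit the Zipf exponent by least squares on a log-log scale. The synthetic corpus below is constructed to be Zipfian with exponent one; it merely stands in for real timbral code-words:

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Fit log(frequency) ~ -alpha * log(rank) by least squares; return alpha."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    pts = [(math.log(r), math.log(f)) for r, f in enumerate(freqs, start=1)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return -num / den

# Synthetic "code-word" stream whose counts are proportional to 1/rank:
corpus = [f"cw{r}" for r in range(1, 51) for _ in range(round(1000 / r))]
print(round(zipf_exponent(corpus), 2))  # close to 1
```

An exponent close to one, as the paper reports for real audio, corresponds to the classic Zipf regime observed in text corpora.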

    Content-based music recommendation based on user preference examples

    Paper presented at the Workshop on Music Recommendation and Discovery 2010 (WOMRAD 2010), held 26 September 2010 in Barcelona, Spain. Recommending relevant and novel music to a user is one of the central applied problems in music information research. In the present work we propose three content-based approaches to this task. Starting from an explicit set of music tracks provided by the user as evidence of his/her music preferences, we infer high-level semantic descriptors covering different musical facets, such as genre, culture, moods, instruments, rhythm, and tempo. On this basis, two of the proposed approaches employ a semantic music similarity measure to generate recommendations. The third approach creates a probabilistic model of the user's preference in the semantic domain. We evaluate these approaches against two recommenders using state-of-the-art timbral features, and two contextual baselines, one exploiting simple genre categories, the other using similarity information obtained from collaborative filtering. We conduct a listening experiment to assess familiarity, liking, and further listening intentions for the provided recommendations. According to the obtained results, our semantic approaches outperform the low-level timbral baselines together with the genre-based recommender. Though the proposed approaches could not reach a performance comparable to the collaborative filtering system, they yielded acceptable results in terms of successful novel recommendations. We conclude that the proposed semantic approaches are suitable for music discovery, especially in the long tail. This research has been partially funded by the FI Grant of Generalitat de Catalunya (AGAUR).
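A semantic-similarity recommender of this kind can be sketched as ranking a catalog by cosine similarity to the centroid of the user's preference examples in a semantic descriptor space; the descriptor dimensions and tracks below are hypothetical, and this toy ranking is not the paper's actual measure:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recommend(preference_examples, catalog, k=2):
    """Rank catalog tracks by similarity to the centroid of the user's examples."""
    n = len(preference_examples)
    centroid = [sum(v) / n for v in zip(*preference_examples)]
    ranked = sorted(catalog, key=lambda item: cosine(centroid, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical semantic descriptors, e.g. (electronic-ness, danceability, acousticness):
catalog = [
    ("ambient_1", (0.9, 0.2, 0.1)),
    ("techno_1",  (0.95, 0.9, 0.05)),
    ("folk_1",    (0.1, 0.3, 0.95)),
    ("folk_2",    (0.05, 0.2, 0.9)),
]
liked = [(0.1, 0.25, 0.9), (0.15, 0.3, 0.85)]  # user's preference examples
print(recommend(liked, catalog))  # the two folk tracks rank first
```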

    A Content-based system for music recommendation and visualization of user preferences working on semantic notions

    Paper presented at the 9th International Workshop on Content-Based Multimedia Indexing (CBMI), held 13-15 October 2011 in Madrid, Spain. The amount of digital music has grown at an unprecedented rate during the last years and requires the development of effective methods for search and retrieval. In particular, content-based preference elicitation for music recommendation is a challenging problem that is effectively addressed in this paper. We present a system which automatically generates recommendations and visualizes a user's musical preferences, given her/his accounts on popular online music services. Using these services, the system retrieves a set of tracks preferred by a user, and further computes a semantic description of musical preferences based on raw audio information. For the audio analysis we used the capabilities of the Canoris API. Thereafter, the system generates music recommendations, using a semantic music similarity measure, and a visualization of the user's preferences, mapping semantic descriptors to visual elements. This research has been partially funded by the FI Grant of Generalitat de Catalunya (AGAUR).